Automatic Translation of Scholarly Terms into Patent Terms Using Synonym Extraction Techniques

نویسندگان

  • Hidetsugu Nanba
  • Toshiyuki Takezawa
  • Kiyoko Uchiyama
  • Akiko Aizawa
چکیده

Retrieving research papers and patents is important for any researcher assessing the scope of a field with high industrial relevance. However, the terms used in patents are often more abstract or creative than those used in research papers, because they are intended to widen the scope of claims. Therefore, a method is required for translating scholarly terms into patent terms. In this paper, we propose six methods for translating scholarly terms into patent terms using two synonym extraction methods: a statistical machine translation (SMT)-based method and a distributional similarity (DS)-based method. We conducted experiments to confirm the effectiveness of our method using the dataset of the Patent Mining Task from the NTCIR-7 Workshop. The aim of the task was to classify Japanese language research papers (pairs of titles and abstracts) using the IPC system at the subclass (third level), main group (fourth level), and subgroup (the fifth and most detailed level). The results showed that an SMT-based method (SMT_ABST+IDF) performed best at the subgroup level, whereas a DS-based method (DS+IDF) performed best at the subclass level.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Japanese Term Extraction Using Dictionary Hierarchy and Machine Translation System

There have been many studies of automatic term recognition (ATR) and they have achieved good results. However, they focus on a mono-lingual term extraction method. Therefore, it is difficult to extract terms from documents in foreign languages. This paper describes an automatic term extraction method from documents in foreign languages using a machine translation system. In our method, we trans...

متن کامل

Improving Patent Translation using Bilingual Term Extraction and Re-tokenization for Chinese-Japanese

Unlike European languages, many Asian languages like Chinese and Japanese do not have typographic boundaries in written system. Word segmentation (tokenization) that break sentences down into individual words (tokens) is normally treated as the first step for machine translation (MT). For Chinese and Japanese, different rules and segmentation tools lead different segmentation results in differe...

متن کامل

Classification of Research Papers into a Patent Classification System Using Two Translation Models

Classifying research papers into patent classification systems enables an exhaustive and effective invalidity search, prior art search, and technical trend analysis. However, it is very costly to classify research papers manually. Therefore, we have studied automatic classification of research papers into a patent classification system. To classify research papers into patent classification sys...

متن کامل

Computational Terminology: Exploring Bilingual and Monolingual Term Extraction

Terminologies are becomingmore important tomodern day society as technology and science continue to grow at an accelerating rate in a globalized environment. Agreeing upon which terms should be used to represent which concepts and how those terms should be translated into different languages is important if wewish to be able to communicate with as little confusion and misunderstandings as possi...

متن کامل

Automatic Query Expansion for Patent Passage Retrieval using Paradigmatic and Syntagmatic Information

Patent text is a mixture of legal terms and domain specific terms. Patent writers tend to paraphrase standard terminology with hypernym, hyponym and synonym substitutions in order to avoid narrowing the scope of the patent invention. The practice of paraphrasing affects the exact match retrieval function negatively. There have been many success stories addressing vocabulary mismatching using ps...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012